test: improve performance of test-crypto-timing-safe-equal-benchmarks #26237

Trott · 2019-02-21T11:31:57Z

Resetting require.cache() to Object.create(null) each time rather than
deleting the specific key results in a 10x improvement in running time.

Fixes: #25984
Refs: #26229

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
commit message follows commit guidelines

nodejs-github-bot · 2019-02-21T11:31:59Z

@Trott build started: https://ci.nodejs.org/blue/organizations/jenkins/node-test-pull-request-lite-pipeline/detail/node-test-pull-request-lite-pipeline/2686/pipeline

Trott · 2019-02-21T11:32:20Z

@bnoordhuis @not-an-aardvark @nodejs/crypto

Trott · 2019-02-21T11:34:26Z

(If this is a reasonable approach, this test can probably be moved out of pummel and start running again more regularly in CI.)

Trott · 2019-02-21T11:35:03Z

~~Pummel CI: https://ci.nodejs.org/job/node-test-commit-custom-suites/885/~~

Whoops, I used the wrong repo in the Parameters. Let's try again.

Pummel CI: https://ci.nodejs.org/job/node-test-commit-custom-suites/886/ ✅

Trott · 2019-02-21T11:37:17Z

Before this change:

$ time ./node test/pummel/test-crypto-timing-safe-equal-benchmarks.js 

real	0m56.422s
user	0m54.419s
sys	0m8.399s
$

With this change:

$ time ./node test/pummel/test-crypto-timing-safe-equal-benchmarks.js 

real	0m4.885s
user	0m4.689s
sys	0m1.418s
$

BridgeAR · 2019-02-21T12:45:53Z

I am somewhat surprised that this change has a positive impact.

@bmeurer @hashseed does using delete on an object cause it to become so slow as shown here? Or do you spot what is causing the performance difference in this case?

Trott · 2019-02-21T12:54:21Z

(The speed increase is also seen on CI. Before this change, test was timing out taking more than 2 minutes. With this change, the test takes 11 seconds. I agree with @BridgeAR that this result is surprising.)

bmeurer · 2019-02-21T13:11:18Z

delete puts the object into dictionary mode. But that's also what you get with Object.create(null). However Object.create(null) can be inlined into optimized code, whereas delete always goes to C++.

BridgeAR · 2019-02-21T13:17:14Z

@bmeurer is there a possibility to improve delete? The performance cliff is extreme in this case.

joyeecheung

LGTM with a nit

test/pummel/test-crypto-timing-safe-equal-benchmarks.js

Trott · 2019-02-21T13:26:13Z

Post-fixup Lite CI for linting: https://ci.nodejs.org/job/node-test-pull-request-lite-pipeline/2688/

Post-fixup Pummel CI to actually run the test: https://ci.nodejs.org/job/node-test-commit-custom-suites/887/

joyeecheung · 2019-02-21T13:27:41Z

BTW, this "fixes" the test by adding an optimization that's not really related to
#26229 (comment) - maybe we should just use vm.runInThisContext to run the script instead of requiring it and then resetting the cache? We don't actually have to load it as a CJS module for what is being tested here.

hashseed · 2019-02-21T14:26:52Z

@bmeurer @hashseed does using delete on an object cause it to become so slow as shown here? Or do you spot what is causing the performance difference in this case?

The reason for the original bad performance was actually a memory leak.

Even though we delete the module keyed by the file name from the cache, it is added to children on the cache entry for the main test file.

This is the (partial) content of require.cache at the first call to runOneBenchmark:

[Object: null prototype] {
  '/usr/local/google/home/yangguo/node/test/pummel/test-crypto-timing-safe-equal-benchmarks.js':
   Module {
     id: '.',
     exports: {},
     parent: null,
     filename:
      '/usr/local/google/home/yangguo/node/test/pummel/test-crypto-timing-safe-equal-benchmarks.js',
     loaded: false,
     children: [ [Module], [Module], [Module] ],
...
}

This is after 10:

[Object: null prototype] {
  '/usr/local/google/home/yangguo/node/test/pummel/test-crypto-timing-safe-equal-benchmarks.js':
   Module {
     id: '.',
     exports: {},
     parent: null,
     filename:
      '/usr/local/google/home/yangguo/node/test/pummel/test-crypto-timing-safe-equal-benchmarks.js',
     loaded: false,
     children:
      [ [Module],
        [Module],
        [Module],
        [Module],
        [Module],
        [Module],
        [Module],
        [Module],
        [Module],
        [Module],
        [Module],
        [Module],
        [Module] ],
...

By resetting require.cache entirely, the module actually get garbage-collected. You can visualize the difference by running with --trace-gc.

BridgeAR · 2019-02-21T14:31:05Z

@hashseed thanks for looking into it again.

Trott · 2019-02-21T14:52:13Z

I'd like to fast-track this to un-break the nightly CI job that runs pummel tests. Please 👍 here if you support fast-tracking, or leave a comment if you think that's unwarranted in this case.

not-an-aardvark

The change LGTM, although it might be worth running the test a bunch of times to make sure it's not more flaky as a result of this change.

Trott · 2019-02-21T19:13:37Z

The change LGTM, although it might be worth running the test a bunch of times to make sure it's not more flaky as a result of this change.

Running the test as it appears in this PR 1000 times: https://ci.nodejs.org/job/node-stress-single-test/2160/

EDIT: Canceled after 521 runs, 458 success, 63 failure. Oddly, I can't get a failure on my local machine at all. Let's see what happens with master branch on CI stress test.

Trott · 2019-02-21T19:21:52Z

Because we're already seeing a small-but-larger-than-expected number of failures on the stress test for this PR, here's a stress test against master but with the timeout bumped to 10 minutes. The idea is to see how often the test currently fails if it's given enough time. We know it will probably fail every time with a timeout, but if the timeout were larger, would the test be as flaky as the version in this PR? Let's find out. https://ci.nodejs.org/job/node-stress-single-test/2161/

Trott · 2019-02-21T20:58:30Z

@not-an-aardvark Any chance at all the test isn't flaky and it really isn't timing-safe? (But, oddly, only on the CI machine but not on my local laptop. ¯\(ツ)/¯)

not-an-aardvark · 2019-02-21T21:25:58Z

Any chance at all the test isn't flaky and it really isn't timing-safe? (But, oddly, only on the CI machine but not on my local laptop. ¯\(ツ)/¯)

I suppose it's possible, but it seems unlikely that this would have resulted from a V8/module-loading change given that the comparison function doesn't do much with V8/module-loading internals. Based on looking at the test failures, it seems like errors are happening in both directions (sometimes it takes longer when the two buffers are equal, and sometimes it takes longer when the two buffers are unequal) which doesn't seem like it would typically occur if it's not timing-safe.

I remember when I was originally trying to debug this, we were also encountering flakiness on some architectures but not others.

hashseed · 2019-02-21T21:34:50Z

Is it possible that the timing difference this test is intended to measure has been overshadowed and therefore masked by the GC time caused by the memory leak?

not-an-aardvark · 2019-02-21T21:35:11Z

If not, and the bug is in the test modifications in this PR, would Joyee's suggestion about using vm.runInThisContext() be a likely fix?

I'm not very familiar with vm.runInThisContext, but it seems like could be worth trying. My best guess is that the flakiness results in the engine sometimes optimizing a particular JS code path over another (e.g. the code path which creates intentionally unequal buffers, rather than the code path that creates intentionally equal buffers), and recompiling the function every time was helping to avoid that kind of noise.

If the original code had a memory leak, another possibility could be that some calls are getting interrupted by GC once in awhile and throwing off the timing.

hashseed · 2019-02-21T21:41:30Z

Note that previously, the code was not being recompiled on every run because it hit V8's internal compilation cache, even though it was deleted from the require cache.

Trott · 2019-02-21T21:56:29Z

@hashseed @not-an-aardvark On the GC hypothesis, sorry to ask a potentially-ignorant or naive question, but is this a correct understanding of what you're saying?: The memory leak means that GC happens from time to time, which will be unpredictable, thus causing timing to appear safe (because it is unpredictable) when it may not actually be? So it would cause the test to pass when perhaps it shouldn't? Or am I misunderstanding?

hashseed · 2019-02-21T22:06:11Z

I don't know what exactly the test asserts. Is it predictability or unpredictability? The memory leak will likely contribute to the latter.

not-an-aardvark · 2019-02-21T22:18:56Z

The test fails if it can find a statistically significant difference between comparison times for equal buffers and comparison times for unequal buffers, using crypto.timingSafeEqual. Unpredictability in comparison times would add statistical noise and reduce the likelihood that it can find a statistically significant difference, making the test more likely to pass. (The test tries to remove high outliers from the timing sample, which could potentially avoid noise caused by GC pauses, but it's not clear how effective this measure is.)

The test also asserts that there is a statistically significant difference between timings for equal/unequal buffers when using the timing-unsafe Buffer#equals, so if there was an extremely large amount of unpredictability then the test would start failing again. This type of failure would result in a different error message.

The failures here seem to indicate that there is a timing anomaly being introduced somewhere, although it seems likely that the anomaly is an artifact of the test setup rather than timing unsafety in crypto.timingSafeEqual.

Resetting require.cache() to `Object.create(null)` each time rather than deleting the specific key results in a 10x improvement in running time. Fixes: nodejs#25984 Refs: nodejs#26229

Trott · 2019-02-21T22:49:12Z

BTW, this "fixes" the test by adding an optimization that's not really related to
#26229 (comment) - maybe we should just use vm.runInThisContext to run the script instead of requiring it and then resetting the cache? We don't actually have to load it as a CJS module for what is being tested here.

@joyeecheung Any reason to avoid eval() in this situation rather than vm.runInThisContext() which will introduce marginally more complication?

…nchmarks Use eval() instead....

Trott · 2019-02-21T22:52:59Z

Changed it up to eval(). 🤞

Stress test: https://ci.nodejs.org/job/node-stress-single-test/2162/

Trott · 2019-02-21T23:08:06Z

Lite CI: https://ci.nodejs.org/job/node-test-pull-request-lite-pipeline/2704/

Trott · 2019-02-21T23:13:18Z

The eval version appears to run as quickly as the Object.create(null) version but does not at all appear to be flaky. (48 runs in a row on stress test so far with no failures. I'll let it get to at least 500 and probably 1000.) Re-reviews would be welcome at this point. Thanks for the patience, everyone.

joyeecheung · 2019-02-21T23:28:55Z

My original suggestion about using vm.runInThisContext() was based on that the previous implementation of the CJS loader used that method to perform the compilation. If eval() hits the compilation cache in similar ways without introducing unpredictable optimization that interferes with the timing being tested (@hashseed ?) then LGTM

Trott · 2019-02-22T02:58:25Z

Stress test results: 1000 out of 1000 runs passed. 🎉

Trott · 2019-02-22T02:59:35Z

Because this is a fair bit different from what was originally opened in this PR, a couple re-reviews or new reviews would be welcome.

Trott · 2019-02-22T03:14:44Z

And...now that we have green stress test and some re-reviews... would like to fast-track to unbreak CI, so I'm re-adding the fast-track label. Please 👍 here to fast-track.

Trott · 2019-02-22T04:23:10Z

Landed in eb9e6c0

Thanks, everyone!

Using `eval()` rather than `require()`'ing a fixture and deleting it from the cache results in a roughtly 10x improvement in running time. Fixes: nodejs#25984 Refs: nodejs#26229 PR-URL: nodejs#26237 Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Yang Guo <[email protected]> Reviewed-By: Ruben Bridgewater <[email protected]> Reviewed-By: Teddy Katz <[email protected]> Reviewed-By: Richard Lau <[email protected]>

Using `eval()` rather than `require()`'ing a fixture and deleting it from the cache results in a roughtly 10x improvement in running time. Fixes: #25984 Refs: #26229 PR-URL: #26237 Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Yang Guo <[email protected]> Reviewed-By: Ruben Bridgewater <[email protected]> Reviewed-By: Teddy Katz <[email protected]> Reviewed-By: Richard Lau <[email protected]>

Trott added the crypto Issues and PRs related to the crypto subsystem. label Feb 21, 2019

Trott requested a review from not-an-aardvark February 21, 2019 11:31

nodejs-github-bot added the test Issues and PRs related to the tests. label Feb 21, 2019

Trott mentioned this pull request Feb 21, 2019

performance regression #26229

Closed

joyeecheung approved these changes Feb 21, 2019

View reviewed changes

test/pummel/test-crypto-timing-safe-equal-benchmarks.js Outdated Show resolved Hide resolved

Trott added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 21, 2019

cjihrig approved these changes Feb 21, 2019

View reviewed changes

addaleax approved these changes Feb 21, 2019

View reviewed changes

hashseed approved these changes Feb 21, 2019

View reviewed changes

BridgeAR approved these changes Feb 21, 2019

View reviewed changes

Trott added the fast-track PRs that do not need to wait for 48 hours to land. label Feb 21, 2019

not-an-aardvark approved these changes Feb 21, 2019

View reviewed changes

Trott removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 21, 2019

test: improve performance of test-crypto-timing-safe-equal-benchmarks

3b626d2

Resetting require.cache() to `Object.create(null)` each time rather than deleting the specific key results in a 10x improvement in running time. Fixes: nodejs#25984 Refs: nodejs#26229

squash! test: improve performance of test-crypto-timing-safe-equal-be…

69655d7

…nchmarks Use eval() instead....

Trott force-pushed the fix-crypto-timing-test branch from 5e1d897 to 69655d7 Compare February 21, 2019 22:51

Trott added the flaky-test Issues and PRs related to the tests with unstable failures on the CI. label Feb 22, 2019

not-an-aardvark approved these changes Feb 22, 2019

View reviewed changes

richardlau approved these changes Feb 22, 2019

View reviewed changes

Trott added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 22, 2019

Trott added the fast-track PRs that do not need to wait for 48 hours to land. label Feb 22, 2019

Trott closed this Feb 22, 2019

BridgeAR mentioned this pull request Feb 26, 2019

v11.11.0 proposal #26322

Merged

Trott deleted the fix-crypto-timing-test branch January 13, 2022 22:51

test: improve performance of test-crypto-timing-safe-equal-benchmarks #26237

test: improve performance of test-crypto-timing-safe-equal-benchmarks #26237

Conversation

Trott commented Feb 21, 2019

Checklist

nodejs-github-bot commented Feb 21, 2019

Trott commented Feb 21, 2019

Trott commented Feb 21, 2019

Trott commented Feb 21, 2019 • edited Loading

Trott commented Feb 21, 2019

BridgeAR commented Feb 21, 2019

Trott commented Feb 21, 2019

bmeurer commented Feb 21, 2019

BridgeAR commented Feb 21, 2019

joyeecheung left a comment

Choose a reason for hiding this comment

Trott commented Feb 21, 2019

joyeecheung commented Feb 21, 2019 • edited Loading

hashseed commented Feb 21, 2019 • edited Loading

BridgeAR commented Feb 21, 2019

Trott commented Feb 21, 2019

not-an-aardvark left a comment

Choose a reason for hiding this comment

Trott commented Feb 21, 2019 • edited Loading

Trott commented Feb 21, 2019

Trott commented Feb 21, 2019

not-an-aardvark commented Feb 21, 2019

hashseed commented Feb 21, 2019

not-an-aardvark commented Feb 21, 2019

hashseed commented Feb 21, 2019

Trott commented Feb 21, 2019

hashseed commented Feb 21, 2019

not-an-aardvark commented Feb 21, 2019

Trott commented Feb 21, 2019

Trott commented Feb 21, 2019

Trott commented Feb 21, 2019

Trott commented Feb 21, 2019

joyeecheung commented Feb 21, 2019

Trott commented Feb 22, 2019

Trott commented Feb 22, 2019

Trott commented Feb 22, 2019

Trott commented Feb 22, 2019

Trott commented Feb 21, 2019 •

edited

Loading

joyeecheung commented Feb 21, 2019 •

edited

Loading

hashseed commented Feb 21, 2019 •

edited

Loading

Trott commented Feb 21, 2019 •

edited

Loading